Abstract

We investigate the diameter problem in the streaming and sliding-window models. We
show that, for a stream of $n$ points or a sliding window of size $n$, any exact algorithm for
diameter requires $\Omega(n)$ bits of space. We present a simple $\epsilon$-approximation¹ algorithm for
computing the diameter in the streaming model. Our main result is an $\epsilon$-approximation
algorithm that maintains the diameter in two dimensions in the sliding-window model using
$O\left(\frac{1}{\epsilon^{3/2}} \log^3 n \left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ bits of space, where $R$ is the maximum, over all
windows, of the ratio of the diameter to the minimum non-zero distance between any two
points in the window.


1 Introduction

In recent years, massive data sets have become increasingly important in a wide range of
applications. In many applications, the input can be viewed as a data stream [12, 7] that the
algorithm reads in one pass. The algorithm should take little time to process each data element
and should use little space in comparison to the input size.

¹ Denote by $A$ the output of an algorithm and by $T$ the value of the function that the algorithm wants to
compute. We say $A$ $\epsilon$-approximates $T$ if $(1 + \epsilon)T \ge A \ge (1 - \epsilon)T$.

In  some  scenarios,  the  input  stream  may  be  infinite,  and  the  application  may  only  care  about 
recent  data.  In  this  case,  the  sliding-window  model  [6]  is  more  appropriate.  As  in  the  streaming 
model,  a  sliding-window  algorithm  should  go  through  the  input  stream  once,  and  there  is  not 
enough  storage  space  for  all  the  data,  even  for  the  data  in  the  window. 

In this paper, we investigate the two-dimensional diameter problem in these two models. Given
a set of points $P$, the diameter is the maximum, over all pairs $x, y$ in $P$, of the distance between
$x$ and $y$. There are efficient algorithms to compute the exact diameter [5, 16] or to approximate
the diameter [1, 3, 4]. However, little has been done for computational geometry problems in the
streaming or sliding-window models. In particular, little is known about the diameter problem
in these two models.

We show that computing the exact diameter for a set of $n$ points in the streaming model
or maintaining it in the sliding-window model (with window-width $n$) requires $\Omega(n)$ bits of
space. However, when approximation is allowed, we present a simple $\epsilon$-approximation algorithm
in the streaming model that uses $O(1/\epsilon)$ space and processes each point in $O(1)$ time. We
also present an approximate sliding-window algorithm to maintain the diameter in 2-d using
$O\left(\frac{1}{\epsilon^{3/2}} \log^3 n \left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ bits of space.

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2,  we  briefly  introduce  the  streaming 
and  sliding-window  models.  In  Section  3,  we  present  our  streaming  diameter-approximation 
algorithm,  and,  in  Section  4,  we  present  our  sliding-window  diameter-approximation  algorithm. 
Section  5  shows  lower  bounds  for  the  exact  diameter  problem  in  both  models.  We  also  discuss 
space  requirements  for  approximation  in  this  section.  Section  6  concludes  this  paper. 


2 Models and Related Work

The streaming model was introduced in [12, 7]. A data stream is a sequence of data elements
$a_1, a_2, \ldots, a_n$. We will denote by $n$ the number of data elements in the stream. In this paper,
the data elements are points.

A  streaming  algorithm  is  an  algorithm  that  computes  some  function  over  a  data  stream  and  has 
the  following  properties: 

1.  The  input  data  are  accessed  in  sequential  order. 

2.  The  order  of  the  data  elements  in  the  stream  is  not  controlled  by  the  algorithm. 

The sliding-window model was introduced in [6]. In this model, one is only interested in the $n$
most recent data elements. Suppose $a_i$ is the current data element. The window then consists
of elements $\{a_{i-n+1}, a_{i-n+2}, \ldots, a_i\}$. When new elements arrive, old elements are aged out.
A sliding-window algorithm is an algorithm that computes some function over the window for
each time instant. Note that the window is a subset of contiguous data elements of the stream.
Properties (1) and (2) above hold in the sliding-window model as well as the streaming model.

Figure 1: Two Examples of Sectors


Because $n$ (the stream length in the streaming model or the window-width in the sliding-window
model) is large, we are interested in sub-linear space algorithms, especially those using polylog($n$)
bits of space.

Previous work in the streaming model addresses computing statistics over the stream. There
are streaming algorithms for estimating the number of distinct elements in a stream [8] and for
approximating frequency moments [2]. Work has also been done on approximating $L_p$ differences
or $L_p$ norms of data streams [7, 13]. There are also algorithms to compute histograms for the
data elements in the streams [10, 9]. Previous work in the sliding-window model [6] addresses
the maintenance of the sum of the data elements in the window. The same work also shows
how to maintain $L_p$ norms in the window. However, aside from the related work on stream
clustering [11, 15], little is known about computational-geometry problems in the streaming or
sliding-window models.

For the problem of computing the diameter in the plane, the following is a simple algorithm
that uses $O(1/\sqrt{\epsilon})$ space and time. Let $l$ be a line and $p, q \in P$ be two points that realize
the diameter. Denote by $\pi_l(p), \pi_l(q)$ the projections of $p, q$ on $l$. Clearly, if the angle $\theta$ between
$l$ and the line $pq$ is smaller than $\sqrt{2\epsilon}$, then
$|\pi_l(p)\pi_l(q)| \ge |pq|\cos\theta \ge \left(1 - \frac{\theta^2}{2}\right)|pq| \ge (1 - \epsilon)|pq|$. By
using a set of lines such that the angle between $pq$ and one of the lines is smaller than $\sqrt{2\epsilon}$, the
algorithm can approximate the diameter with bounded error.

The algorithm can go through the input in one pass, project the points onto each line, and
maintain the extreme points for the lines. Thus, it is essentially a streaming algorithm. However,
the time taken per point is proportional to the number of lines used, which is $\Omega(1/\sqrt{\epsilon})$. We
present an almost equally simple algorithm that removes this dependence of running time on $\epsilon$.
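
The projection-based algorithm described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class name `ProjectionDiameter`, the evenly spaced choice of line directions, and the representation of points as coordinate pairs are not taken from the paper.

```python
import math

class ProjectionDiameter:
    """One-pass diameter estimate via projections onto a fixed set of lines."""

    def __init__(self, eps):
        # Lines are undirected, so directions only need to cover [0, pi).
        # Spacing them at most 2*sqrt(2*eps) apart places every direction
        # within sqrt(2*eps) of some line (assumed layout, for illustration).
        k = max(1, math.ceil(math.pi / (2.0 * math.sqrt(2.0 * eps))))
        self.dirs = [(math.cos(i * math.pi / k), math.sin(i * math.pi / k))
                     for i in range(k)]
        self.lo = [math.inf] * k    # smallest projection seen on each line
        self.hi = [-math.inf] * k   # largest projection seen on each line

    def add(self, p):
        # The cost per point is proportional to the number of lines.
        for i, (ux, uy) in enumerate(self.dirs):
            t = p[0] * ux + p[1] * uy
            self.lo[i] = min(self.lo[i], t)
            self.hi[i] = max(self.hi[i], t)

    def estimate(self):
        if self.lo[0] == math.inf:  # no point has been seen yet
            return 0.0
        return max(h - l for h, l in zip(self.hi, self.lo))
```

Each call to `add` touches every line, which is the per-point cost that the sector-based algorithm of the next section avoids.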


3 A Sector-Based Streaming Diameter-Approximation in the Streaming Model


Our basic idea is to divide the plane into sectors and compute the diameter of $P$ using the
information in each sector. Sectors are constructed by designating a point $x_0$ as the center and
dividing the plane using an angle of $\theta$. We show two sectors in Figure 1.

Algorithm Streaming-Diameter


1. Take the first point of the stream as the center, and divide the plane into sectors
according to an angle $\theta = \frac{\epsilon}{2(1-\epsilon)}$, where $\epsilon$ is the error bound. Let $S$ be the set of
sectors.

2. While going through the stream, for each sector, record the point in that sector
that is the furthest from the center. Also keep track of the maximum distance,
$R_c$, between the center and any other point in $P$.

3. Let $|ab|$ be the distance between points $a$ and $b$. Define $D^{ij}_{\max} = \max |uv|$ for
$u \in$ boundary arc of sector $i$ and $v \in$ boundary arc of sector $j$, and define
$D^{ij}_{\min} = \min |uv|$ for $u \in$ boundary arc of sector $i$ and $v \in$ boundary arc of sector
$j$. Output $\max\{R_c, \max_{i,j \in S} D^{ij}_{\min}\}$ as the diameter of the point set $P$.


The sectors have outer boundaries (the arcs $aa'$ and $bb'$ in the figure) that are determined by
the distance between the center and the farthest point from the center in that sector. The
algorithm records the farthest point for each sector while it goes through the input stream. The
full description of the algorithm is given in algorithm “Streaming-Diameter”. The algorithm’s
space complexity is determined by the sector angle $\theta$.


Claim 3.1 The distance between any two points in sector $i$ and sector $j$ is no larger than
$\max\{R_c, D^{ij}_{\max}\}$. (Here $i$ could be equal to $j$.)


Proof. Let $u$ be a point in sector $i$ and $v$ be a point in sector $j$. Extend $x_0u$ until it reaches the
arc $aa'$. Denote the intersection point $u'$. Also extend $x_0v$ until it reaches the arc $bb'$. Denote
the intersection point $v'$. Then we have
$|uv| \le \max\{|x_0v|, |vu'|\} \le \max\{R_c, |x_0u'|, |u'v'|\} \le \max\{R_c, D^{ij}_{\max}\}$.
(In the first two inequalities above we have used the fact that, if $a, b, c$ occur in that
order on a line and $d$ is some point, then $|db| \le \max(|da|, |dc|)$.) ■

Claim 3.2 With notation as in Figure 1 and in the description of the algorithm,
$D^{ij}_{\max} \le D^{ij}_{\min} + \mathrm{length}(aa') + \mathrm{length}(bb') \le D^{ij}_{\min} + 2R_c \cdot \theta$.


Proof. Let $|uv| = D^{ij}_{\max}$ and $|u'v'| = D^{ij}_{\min}$. Because $u, u' \in$ arc $aa'$ and $v, v' \in$ arc $bb'$, there
is a path from $u$ to $v$, namely $u \to u' \to v' \to v$. Therefore
$D^{ij}_{\max} \le |uu'| + D^{ij}_{\min} + |vv'| \le D^{ij}_{\min} + 2R_c \cdot \theta$. ■


Assume that the true diameter $diam_{true}$ is the distance between a point in sector $i$ and another
point in sector $j$. Let $diam$ be the diameter computed by our algorithm. We observe the
following:


$\max\{R_c, D^{ij}_{\min}\} \le \max\{R_c, \max_{m,n \in S} D^{mn}_{\min}\} = diam \le diam_{true} \le \max\{R_c, D^{ij}_{\max}\}$


Depending on the relationship between $R_c$ and $D^{ij}_{\min}$, we consider two cases. In the case where
$R_c \ge D^{ij}_{\min}$, we want $R_c \ge (1 - \epsilon)D^{ij}_{\max}$ in order to bound the error. This leads to
$\theta \le \frac{\epsilon}{2(1-\epsilon)}$. In the case where $R_c < D^{ij}_{\min}$, we want $D^{ij}_{\min} \ge (1 - \epsilon)D^{ij}_{\max}$. Again, this
leads to $\theta \le \frac{\epsilon}{2(1-\epsilon)}$. We then have the following theorem.

Figure 2: Rounding Points in Each Interval

Theorem 3.3 There is an algorithm that $\epsilon$-approximates the 2-d diameter in the streaming
model using storage for $O(\frac{1}{\epsilon})$ points. In order to process each point, it takes $O(1)$ time.
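
A minimal Python sketch of algorithm “Streaming-Diameter” follows. It rests on assumptions of ours that are not part of the paper: the class name `StreamingDiameter`, storing only the radius of the farthest point per sector (which is all the output step needs), and evaluating $D^{ij}_{\min}$ in closed form. Two points at radii $r_i$ and $r_j$ about the center that are separated by an angle $\varphi \in [0, \pi]$ are at distance $\sqrt{r_i^2 + r_j^2 - 2r_ir_j\cos\varphi}$, which grows with $\varphi$, so the minimum over the two arcs is attained at the smallest angular gap between the sectors.

```python
import math

class StreamingDiameter:
    """Sketch of the sector-based streaming diameter approximation."""

    def __init__(self, eps):
        theta = eps / (2.0 * (1.0 - eps))
        self.m = max(1, math.ceil(2.0 * math.pi / theta))  # number of sectors
        self.width = 2.0 * math.pi / self.m                # sector angle <= theta
        self.center = None
        self.radius = [0.0] * self.m  # farthest distance recorded per sector
        self.rc = 0.0                 # R_c: farthest distance from the center

    def add(self, p):
        # O(1) work per point: one sector lookup and two comparisons.
        if self.center is None:
            self.center = p
            return
        dx, dy = p[0] - self.center[0], p[1] - self.center[1]
        r = math.hypot(dx, dy)
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        idx = min(self.m - 1, int(angle / self.width))
        self.radius[idx] = max(self.radius[idx], r)
        self.rc = max(self.rc, r)

    def estimate(self):
        # Output max{R_c, max over sector pairs of D_min}.
        best = self.rc
        occupied = [(i, r) for i, r in enumerate(self.radius) if r > 0.0]
        for a, (i, ri) in enumerate(occupied):
            for j, rj in occupied[a + 1:]:
                k = min(j - i, self.m - (j - i))    # circular sector distance
                gap = max(0, k - 1) * self.width    # smallest angle between arcs
                d2 = ri * ri + rj * rj - 2.0 * ri * rj * math.cos(gap)
                best = max(best, math.sqrt(max(0.0, d2)))
        return best
```

The update touches a single sector, so the per-point time does not depend on $\epsilon$; only the query in `estimate`, which scans pairs of sectors, does.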

The above algorithm does not work in the sliding-window model. In the streaming model, the
boundaries of sectors only expand. This nice property allows us to keep only the extreme points.
However, in the sliding-window model, the diameter may decrease from one window to the next. One
may need more information in order to report the diameter for each window. In the next section, we
give a deterministic algorithm that maintains an approximation to the diameter in the
sliding-window model.


4 Maintaining the Diameter in the Sliding-Window Model

First, we consider maintaining the diameter for points on a line. In the sliding-window model,
each point has an age indicating its location in the current window. The recently arrived points
are new points, and the expiring points are old points. We denote by $|ab|$ the distance between
point $a$ and point $b$. We also say that the distance $r = |ab|$ is realized by points $a$ and $b$. We
may further say that $r$ is realized by $a$, when it is not necessary to mention $b$ or $b$ is clear within
the context. In particular, the diameter realized by $a$, denoted $diam_a$, is the maximum distance
realized by point $a$ within some window.

Given three points $a, b, c$ and an approximation error of $\epsilon$, if we treat point $c$ as a center (the
coordinate zero), we can “round” point $b$ to point $a$ if $|ac| \le |bc| \le (1 + \epsilon)|ac|$. Given a set of
points in the window, we can pick some point as the center and round the other points in the
same manner. We keep the following invariant in rounding:

Invariant 4.1 If a point is translocated² in a “rounding,” it can only be translocated
toward the center.

Consider the distance intervals $[c, t_0), [t_0, t_1), [t_1, t_2), \ldots, [t_{k-1}, t_k]$, such that $c$ is the center and
$|ct_i| = (1 + \epsilon)^i d$, where $d$ is the minimum distance between the center and any other point. Each
point in the interval $[t_i, t_{i+1})$ can be rounded down to $t_i$ (Figure 2).


If  multiple  points  are  rounded  to  the  same  location,  we  can  discard  the  older  ones  and  only  keep 
the  newest  one.  We  will  then  have  at  most  one  point  in  each  of  these  intervals. 
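
The rounding step can be illustrated with a short Python sketch. The function name `round_to_center` and the representation of points as (position, timestamp) pairs are our own assumptions, and the sketch handles points on one side of the center only.

```python
import math

def round_to_center(center, points, eps):
    """Round 1-d points toward `center`, keeping the newest point per interval.

    `points` holds (position, timestamp) pairs on one side of `center`,
    none of them located at the center itself.
    """
    d = min(abs(x - center) for x, _ in points)  # minimum distance to the center
    newest = {}
    for x, t in points:
        dist = abs(x - center)
        # Index i of the interval [t_i, t_{i+1}) that contains the point.
        i = math.floor(math.log(dist / d, 1.0 + eps))
        if i not in newest or t > newest[i][1]:
            sign = 1.0 if x > center else -1.0
            # min() guards against floating-point overshoot away from the center.
            newest[i] = (center + sign * min(dist, d * (1.0 + eps) ** i), t)
    return [newest[i] for i in sorted(newest)]
```

For example, with center 0, $\epsilon = 0.1$ and points at 1.0, 1.05, 1.3, 2.0 and 2.1 (timestamps 0 to 4), the points at 1.0 and 1.05 share interval 0 and the points at 2.0 and 2.1 share interval 7, so three representatives remain.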

² If a point is rounded to a new location, we say that the point is “translocated.”

Let $D$ be the diameter of a set of points. The number $k$ of points that result from rounding all
the points in this set will be bounded as follows:


$k \le \log_{1+\epsilon}\frac{D}{d} = \frac{\log(D/d)}{\log(1+\epsilon)} = O\left(\frac{1}{\epsilon}\log\frac{D}{d}\right)$


We call the set of points that results from rounding a cluster. Note that the points in a cluster
are different from the points in the original input stream. A point $a$ in the cluster may represent
several original points rounded to it. (We call the point $a$ the representative point of these
original input points in the cluster.) The location of the representative point in the cluster may
not be the same as the location of the original points. If $2^l$ original points are represented by
a cluster, the cluster is said to be at level $l$. The diameter of the original set of points can be
approximated by the diameter of the cluster.

With this scheme, we are able to round a point, say $b$, to another point, say $a$, because there
is some distance (for example $|bc|$) realized by $b$ that promises a lower bound for any diameter
realized by $b$, and the error incurred in the rounding is a small fraction of this lower bound. In
the sliding-window model, the point $c$ may be aged out in the future, making the approximation
error too large. Another issue is whether to recenter the points each time a new point arrives.
This will result in too many roundings and introduce too much error in the approximation as
well.


To  overcome  these  problems,  we  maintain  multiple  clusters  each  of  which  has  the  following 
properties: 


1.  A  cluster  represents  an  interval  of  points  in  the  window  (the  set  of  points  within  a  time 
interval  of  the  window).  The  newest  point  in  the  interval  is  picked  to  be  the  center. 
The other points in the interval are rounded. The resulting points form a cluster that
represents  the  original  points  in  the  window  interval. 

2.  The  levels  of  the  clusters  are  integers. 

3.  We  allow  at  most  two  clusters  at  each  level. 

4.  When  the  number  of  clusters  at  level  i  exceeds  2,  the  oldest  two  clusters  (where  the  age  of 
a  cluster  is  determined  by  the  age  of  its  center)  at  that  level  are  merged  to  form  a  cluster 
at  level  i  +  1 . 


Imagine a tree built on the original input points in a window. The points are the leaves. Two
consecutive points can form a node (a cluster) at level 1. Two consecutive level-1 clusters can
merge to form a node (a cluster) of level 2. This can be repeated recursively until we reach the
top level. In this structure, the original input points represented by a cluster are the leaves of
the subtree rooted at the node corresponding to that cluster. Note that, at each level, we only
keep at most 2 nodes (clusters). The original input points represented by all the clusters that
we keep form a cover of the window. Thus, the whole window can be represented by $O(\log n)$
clusters. Figure 3 shows an example of the clusters built on a window.
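
The level bookkeeping alone (at most two clusters per level, with the two oldest merged upward) can be sketched as follows. This is a simplified illustration under our own assumptions: the function name `insert_point` is hypothetical, a cluster is reduced to the pair (timestamp of its center, number of original points), and neither the rounding of points nor expiry from the window is modeled.

```python
def insert_point(levels, t):
    """Add the point with timestamp t and restore "at most 2 clusters per level".

    levels[i] lists the level-i clusters, oldest first, as
    (center_timestamp, number_of_original_points) pairs.
    """
    if not levels:
        levels.append([])
    levels[0].append((t, 1))        # a new point is a cluster of size 1
    i = 0
    while len(levels[i]) > 2:
        # Merge the two oldest clusters; the newer center is kept.
        (_, s1), (t2, s2) = levels[i][0], levels[i][1]
        del levels[i][:2]
        if i + 1 == len(levels):
            levels.append([])
        levels[i + 1].append((t2, s1 + s2))
        i += 1
```

After inserting 100 points, every level-$i$ cluster in this sketch represents $2^i$ original points and at most 14 clusters are kept, in line with the $O(\log n)$ bound.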

When the window slides forward, new points are added to the window and new clusters are
formed. To maintain the required number of clusters at each level, clusters are merged whenever
there are too many clusters at some level. Once a cluster reaches the top level, it stays at that
level. Points in this cluster will ultimately be aged out until the whole cluster is gone.

Figure 3: Clusters built for the First Window

In order to merge clusters $c_1$ centered at $Ctr_1$ and $c_2$ centered at $Ctr_2$ to form cluster $c_3$, we go
through the following steps. (We can assume, w.l.o.g., that $c_2$ is newer than $c_1$.)

1. Use $Ctr_2$ as the center of the newly formed cluster $c_3$.

2. Discard the points in $c_1$ that are located between the centers of $c_1$ and $c_2$.

3. After (2), if point $p$ in $c_1$ satisfies $|pCtr_2| \le |pCtr_1| \le (1 + \epsilon)|Ctr_1Ctr_2|$, discard $p$.

4. Let $P_{merge}$ consist of the remaining points of $c_1$ and the points in $c_2$. Round the points in
$P_{merge}$. The new center is $Ctr_2$, and the new value of $d$ is the minimum distance from
$Ctr_2$ to any other point in $P_{merge}$. The new value of $d$ may be different from the one used
in building the cluster $c_2$. We may need to round the points in cluster $c_2$.
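
The four merge steps can be sketched in Python for points on a line. Several details here are our own assumptions rather than the paper's: the names `merge` and `round_about`, the dictionary representation of a cluster, keeping the old center $Ctr_1$ as an ordinary point of the merged cluster, and dropping any point that coincides with the new center.

```python
import math

def round_about(center, pool, eps):
    """Step 4: round (position, timestamp) pairs toward `center`."""
    pool = [(x, t) for x, t in pool if x != center]  # assumed: center covers these
    if not pool:
        return []
    d = min(abs(x - center) for x, _ in pool)        # new value of d
    newest = {}
    for x, t in pool:
        dist = abs(x - center)
        side = 1 if x > center else -1
        key = (side, math.floor(math.log(dist / d, 1.0 + eps)))
        if key not in newest or t > newest[key][1]:
            rounded = min(dist, d * (1.0 + eps) ** key[1])  # never move outward
            newest[key] = (center + side * rounded, t)
    return sorted(newest.values())

def merge(c1, c2, eps):
    """Merge the older cluster c1 into the newer cluster c2.

    A cluster is a dict {"center": (pos, time), "points": [(pos, time), ...]}.
    """
    ctr1, ctr2 = c1["center"][0], c2["center"][0]
    gap = abs(ctr1 - ctr2)
    lo, hi = min(ctr1, ctr2), max(ctr1, ctr2)
    kept = []
    for x, t in c1["points"]:
        if lo < x < hi:
            continue  # step 2: strictly between the two centers
        if abs(x - ctr2) <= abs(x - ctr1) <= (1.0 + eps) * gap:
            continue  # step 3: just beyond Ctr_2, too close to it
        kept.append((x, t))
    pool = kept + [c1["center"]] + c2["points"]
    # Step 1: the newer center Ctr_2 becomes the center of the merged cluster.
    return {"center": c2["center"], "points": round_about(ctr2, pool, eps)}
```

For instance, with $\epsilon = 0.5$, $Ctr_1 = 0$ and $Ctr_2 = 10$, a point of $c_1$ at 3 is dropped by step (2) and a point at 10.5 is dropped by step (3), while points at $-4$ and 25 survive and are rounded toward $Ctr_2$.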

From step (4), we know that the new value of $d$ is the minimum distance between $Ctr_2$ and any
other point in $P_{merge}$. Let $p_{min}$ be this minimum distance point. If $p_{min}$ belongs to cluster $c_1$,
the distance $|Ctr_2p_{min}|$ may be much smaller than the distance between the point $Ctr_2$ and
the original point(s) represented by $p_{min}$. This happens because when the points are rounded to
form the cluster $c_1$, the rounding is based on the distance between these points and the center
$Ctr_1$, not the point $Ctr_2$. Thus we cannot lower bound the value of $d$ for the new cluster $c_3$
by the minimum distance between its center and any other original point whose representative
point is in the cluster. However, steps (2) and (3) ensure that $|Ctr_2p_{min}|$ is at least $\epsilon \cdot |Ctr_1Ctr_2|$.
Otherwise, $p_{min}$ would be discarded. We know that the two points $Ctr_1$ and $Ctr_2$ are at their
original locations. Thus, $d$ is bounded below by $\epsilon$ times the minimum distance between the cluster
center and any other original point whose representative point is in the cluster. The lower
bound for the whole window will then be the minimum over all the clusters.

Define a boundary point in a cluster to be an extreme point. We keep track of the boundary
points for each cluster as well as the boundary points for the whole window. Points may expire
from the oldest cluster, and this may require updating the boundary points of this cluster. The
whole process is summarized in algorithm “Sliding-Window Diameter”.

Algorithm Sliding-Window Diameter


Update: when a new point arrives:

1. Check the age of the boundary points of the oldest cluster. If one of them has
expired, remove it and update the boundary point.

2. Make the newly arrived point a cluster of size 1. Go through the clusters from
most recent to oldest and merge clusters whenever necessary according to the
rules stated above. Update the boundary points of the clusters resulting from
merges.

3. Update the boundary points of the window if necessary.

Query Answer: Report the distance between the boundary points of the window as
the window diameter.

Call  the  time  during  which  an  original  point  is  within  some  sliding  window  the  lifetime  of  that 
point. Let’s trace a point $p$ through its lifetime. For simplicity, in what follows, instead of
saying  that  the  original  point  p  is  represented  by  some  point  in  some  cluster,  we  will  just  say 
p  is  contained  or  included  in  that  cluster.  When  clusters  merge,  instead  of  saying  that  the 
representative  point  of  p  is  rounded  and  p  has  a  new  representative  point  in  the  new  cluster,  we 
will  just  say  p  is  rounded  (“translocated”)  and  has  a  new  location  now. 

Let $p_0$ be the original location of the point $p$ and $Ctr_0$ be the center of the first cluster that
includes the point $p$. When this cluster and some other cluster merge, $p$ could be rounded to
a new location $p_1$. Let $Ctr_1$ be the center of the newly formed cluster. If we continue this
process, before $p$ expires or is discarded, we will have a sequence of $p$'s locations $p_0, p_1, \ldots, p_t$
and a corresponding sequence of centers $Ctr_0, Ctr_1, \ldots, Ctr_t$. We observe that $Ctr_i$ and $Ctr_{i+1}$
will be on the same side of $p_i$. Otherwise, we would have discarded the point $p$.

Figure 4: A point may be translocated in each rounding, but all the translocations are in the
same direction.

Claim 4.2 If a point is rounded multiple times during its lifetime, all the translocations
because of rounding are in the same direction (Figure 4). In other words, for all the rounded
locations $p_i$ and all the corresponding centers $Ctr_i$, $|p_0Ctr_i| \ge |p_iCtr_i|$.


Proof. Suppose that the first time $p$ is rounded, it is rounded to the right. If now $p$ is rounded
to the left for the first time on step $i$, then $Ctr_{i-1}$ lies to the right of $p$ while $Ctr_i$ lies to the
left. Further, $p$ belonged to the cluster of $Ctr_{i-1}$ before the merge. Hence, by our rules it would
have been discarded, not rounded, because it lies between the two centers, and it belongs to the
older cluster. ■

We bound the error in the rounding process by showing that, for all $i$, $|p_0p_i|$ is at most an $\epsilon$
fraction of the diameter realized by $p$.

In  an  arbitrary  rounding  scheme,  with  multiple  roundings,  a  point  can  be  translocated  arbitrarily. 
The  distance  from  the  location  after  rounding  to  some  new  center  will  not  promise  a  lower  bound 
for the diameter realized by the point. However, with our rounding scheme, Claim 4.2 guarantees
the  following  invariant: 

Invariant  4.3  If  a  point  is  rounded  (even  multiple  times),  the  distance  between 
this  point  after  rounding  and  any  of  its  future  cluster  centers  is  at  most  the  distance 
of  any  diameter  realized  by  this  point. 

Each time we round a point, we introduce some dislocation or error. Let $err_{i+1} = |p_ip_{i+1}|$ be
the dislocation introduced in the $(i+1)$-th merging. Also, let $diam_p$ be the diameter realized by
$p$ in some window. We have the following lemma:

Lemma 4.4 The total rounding error of point $p$ before it is discarded or expires is at most
$\epsilon \log n \cdot diam_p$.

Proof. In each rounding, we maintain $|p_iCtr_{i+1}| \le (1 + \epsilon)|p_{i+1}Ctr_{i+1}|$. Thus
$err_{i+1} = |p_ip_{i+1}| \le \epsilon|p_{i+1}Ctr_{i+1}| \le \epsilon|p_0Ctr_{i+1}|$. A point participates in at most $\log n$ merges. The total
amount of translocation is then at most $\sum_i err_i \le \epsilon \log n \cdot \max_i |p_0Ctr_i|$. Also, our Invariant 4.3
states that $diam_p \ge \max_i |p_0Ctr_i|$. ■

To bound the error by $\frac{\epsilon}{2}$, we run the rounding scheme with $\frac{\epsilon}{2\log n}$ in place of $\epsilon$. The number
of points in a cluster after rounding will then be $O\left(\frac{1}{\epsilon}\log n \log\frac{D}{d}\right)$. As mentioned above, for each
cluster, $d$ is bounded below by the rounding parameter times
the minimum distance between the center of the cluster and any other original point whose
representative point is in the cluster. Denote by $R$ the maximum, over all windows, of the
ratio of the diameter to the minimum non-zero distance between any two original points in that
window. Then $\log\frac{D}{d} \le \log R + \log\frac{2\log n}{\epsilon} = \log R + \log\log n + \log\frac{1}{\epsilon} + O(1)$. The number of points in a
cluster can then be bounded by $O\left(\frac{1}{\epsilon}\log n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$.

Theorem 4.5 There is an $\epsilon$-approximation algorithm for maintaining diameter in one
dimension in a sliding window of size $n$, using $O\left(\frac{1}{\epsilon}\log^3 n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ bits of space,
where $R$ is the maximum, over all windows, of the ratio of the diameter to the minimum
non-zero distance between any two points in that window. The algorithm answers the
diameter query in $O(1)$ time. Each time the window slides forward, the algorithm needs a worst-case
time of $O\left(\frac{1}{\epsilon}\log^2 n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ to process the incoming point. With a slight
modification, the algorithm can process incoming points with $O(\log n)$ amortized time using
$O\left(\frac{1}{\epsilon}\log^2 n\left(\log n + \log\log R + \log\frac{1}{\epsilon}\right)\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ bits of space.

Proof. The correctness of our algorithm is clear given the chosen value of $\epsilon$ and Lemma 4.4.
We now analyze the time and space requirements of our algorithm. For each cluster, we maintain
the following information:

1. The exact location of the center and the exact location of the point closest to (but not
located at) the center.

2.  The  age  of  all  the  points. 

3.  The  relative  positions  of  all  the  points  other  than  the  center. 

The relative positions of all the points in a cluster can be encoded by a bit vector. We may need
$\log n$ bits of space to record the age in the current window for each point. Thus, we need $O(\log n)$
bits for each cluster point except the center. There are at most $O\left(\frac{1}{\epsilon}\log n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$
points in each cluster. The space requirement for storing the information in items (2) and (3)
for the whole cluster is then $O\left(\frac{1}{\epsilon}\log^2 n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$. Because we assumed that
this space is much larger than the space required to store two points, we can neglect the latter
(the space for information in item (1)). Given that there are $O(\log n)$ clusters, the total space
requirement will be $O\left(\frac{1}{\epsilon}\log^3 n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ to maintain the diameter.

In order to report the diameter at any time, we maintain the two boundary points for the window
while we maintain the clusters. For each cluster, we only need to look at its boundary points,
and thus the process of updating the sliding window’s boundary points will only cost $O(\log n)$
time.

However, while updating the clusters, we may face a sequence of cascading merges. In the worst
case, we may need to merge $O(\log n)$ clusters with $O\left(\frac{1}{\epsilon}\log n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ points in
each. This requires time $O\left(\frac{1}{\epsilon}\log^2 n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$.

If a bit vector is used to specify the relative locations of the points in a cluster, when we process
the cluster during merging we may need to go through the zero entries in the vector. This
could be a waste of time if the vector is sparse. We can directly specify the relative location of
a point instead. Because there are $O\left(\frac{1}{\epsilon}\log n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ different locations, we
need an additional $O\left(\log\frac{1}{\epsilon} + \log\log n + \log\log R\right)$ bits, besides the $O(\log n)$ bits stated above,
for each point in a cluster. The space requirement for each point in a cluster will then be
$O\left(\log n + \log\log R + \log\frac{1}{\epsilon}\right)$. With this modification, when merging two clusters, we are free
of overhead other than processing the points in the clusters. During a point’s lifetime, it will
take part in at most $\log n$ merges. Thus, a simple analysis can show that the amortized cost for
updating is now only $O(\log n)$. ■

To extend the algorithm to 2-d, we can apply the technique discussed at the beginning of the
previous section. We have a set of lines and project the points in the plane onto the lines.
We guarantee that, for any pair of points, some line in the set makes an angle $\theta$ with the line
through the pair such that $1 - \cos\theta \le \frac{\epsilon}{2}$. This will require $O(\frac{1}{\sqrt{\epsilon}})$ lines. We then use our
diameter-maintenance algorithm on lines to maintain the diameter in the 2-d case.

Theorem 4.6 There is an $\epsilon$-approximation algorithm for maintaining diameter in 2-d in a
sliding window of size $n$ using $O\left(\frac{1}{\epsilon^{3/2}}\log^3 n\left(\log R + \log\log n + \log\frac{1}{\epsilon}\right)\right)$ bits of space, where $R$ is
the maximum, over all windows, of the ratio of the diameter to the minimum non-zero distance
between any two points in that window.

Figure 5: Reduction from DISJ to Diameter

5 Lower Bounds

It is well-known that the set-disjointness problem has a linear communication complexity [14] and
thus a linear space lower bound in the streaming model. One can map the set elements to points
on a circle such that the diameter of the circle will be realized if and only if the corresponding
element is present in both sets. This reduction gives the following theorem.

Theorem 5.1 Any streaming algorithm that computes the exact diameter of $n$ points, even if
each point can be encoded using at most $O(\log n)$ bits, requires $\Omega(n)$ bits of space.

Proof. We reduce the set-disjointness problem to a diameter problem. The set-disjointness
problem is defined as follows: Given a set $U$ of size $n$ and two subsets $x \subseteq U$ and $y \subseteq U$, the
function $disj(x, y)$ is defined to be “1” when $x \cap y = \emptyset$ and “0” otherwise. The corresponding
language DISJ is the set $\{(x, y) \mid x \subseteq U, y \subseteq U, x \cap y = \emptyset\}$.

The set-disjointness problem has a linear communication complexity lower bound. Because a streaming algorithm can easily be converted into a one-round communication protocol, the linear communication complexity lower bound gives a linear space lower bound for the set-disjointness problem in the streaming model.
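The conversion from a streaming algorithm to a one-round protocol can be sketched as follows: the first party runs the algorithm on its part of the input and sends the memory contents, and the second party resumes from that state. The interface (`init`, `update`, `query`) and the toy algorithm below are hypothetical illustrations, not part of the paper.

```python
def one_round_protocol(init, update, query, alice_items, bob_items):
    """Simulate a one-pass streaming algorithm as a one-round protocol."""
    state = init()
    for item in alice_items:          # Alice streams her part of the input
        state = update(state, item)
    message = state                   # the only communication: the memory state
    for item in bob_items:            # Bob continues from Alice's state
        message = update(message, item)
    return query(message)

# Toy streaming algorithm (assumption, for illustration only): the exact
# diameter of points on a line in the plain streaming model, which needs
# only the minimum and maximum seen so far.
def init():
    return None

def update(state, value):
    if state is None:
        return (value, value)
    return (min(state[0], value), max(state[1], value))

def query(state):
    return state[1] - state[0]
```

The message length equals the space used by the streaming algorithm, so a communication lower bound for the one-round protocol is also a space lower bound for the algorithm.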

Consider points on a circle in the plane. For a given point $p_i$, there is exactly one other point on the circle whose distance from $p_i$ equals the diameter of the circle. Denote this antipodal point by $p_i'$. The distance between $p_i$ and every other point on the circle is smaller than the distance between $p_i$ and $p_i'$. We map each element $i \in U$ to one such antipodal pair. The presence of one point of the pair in the stream corresponds to the presence of element $i$ in subset $x$, and the presence of the other point corresponds to the presence of element $i$ in $y$. Both $p_i$ and $p_i'$ appear only if element $i$ is in both $x$ and $y$.

Given an instance $(x, y)$ of DISJ, we construct an instance of the diameter problem according to the above principle. We give an example in Figure 5.

The solid squares in the figure are the points we put into the diameter instance. The DISJ instance in Figure 5 is $(x, y)$, where $x = 1011$ and $y = 1100$. The diameter instance contains $p_1, p_3, p_4$, because $x = 1011$, and $p_1', p_2'$, because $y = 1100$. The dashed circles in the figure show the locations of $p_2, p_3', p_4'$. Because $x_2 = 0$ and $y_3 = y_4 = 0$, these points are not present in the stream.

In the example, element 1 is in both $x$ and $y$. The diameter of the constructed point set is $|p_1 p_1'|$, which is exactly the diameter of the circle. On the other hand, if $x \cap y = \emptyset$, the diameter of the point set is strictly smaller than the diameter of the circle. Thus, an exact algorithm for the diameter problem could be used to solve the set-disjointness problem.
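The reduction can be checked on small instances with the following sketch. It assumes the $2n$ candidate positions are evenly spaced on a circle of diameter 1 and that the sets are given as bit strings; the helper names `disj_to_points` and `sets_intersect` are hypothetical.

```python
import math
from itertools import combinations

def disj_to_points(x, y):
    """Map bit strings x, y (characteristic vectors of two subsets) to points
    on a circle of diameter 1: p_i if x[i] == '1', its antipode if y[i] == '1'."""
    n = len(x)
    points = []
    for i in range(n):
        angle = math.pi * i / n                    # positions spaced pi/n apart
        p = (0.5 * math.cos(angle), 0.5 * math.sin(angle))
        if x[i] == "1":
            points.append(p)
        if y[i] == "1":
            points.append((-p[0], -p[1]))          # antipodal point p_i'
    return points

def sets_intersect(x, y):
    """Decide whether x and y share an element by inspecting the diameter."""
    points = disj_to_points(x, y)
    if len(points) < 2:
        return False
    diameter = max(math.dist(p, q) for p, q in combinations(points, 2))
    # Largest non-antipodal distance is cos(pi / (2n)); use the midpoint to 1.
    threshold = (1.0 + math.cos(math.pi / (2 * len(x)))) / 2.0
    return diameter > threshold
```

For the instance of Figure 5, `sets_intersect("1011", "1100")` reports an intersection because both $p_1$ and $p_1'$ are present.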

In the above construction, in order to distinguish the case in which $x \cap y = \emptyset$ from the case in which $x \cap y \ne \emptyset$, if the circle has diameter “1,” the algorithm must distinguish 1 from $\cos(\frac{\pi}{2n})$. Because $1 - \cos(x) \ge \frac{1}{2}x^2 - \frac{1}{24}x^4$, for $x = \frac{\pi}{2n}$ and large $n$, a difference of $\frac{1}{n^2}$ must be detectable. This means that the encoding of each point must have precision $\frac{1}{n^2}$, which can be achieved using $O(\log n)$ bits. ■

In  the  sliding-window  case,  we  have  a  similar  bound  even  for  points  on  a  line.  Obviously  the 
lower  bound  holds  for  higher  dimensions  as  well. 

Theorem 5.2 To maintain, in a sliding window of size $n$, the exact diameter of a set of points on a line, even if each point in the set can be encoded using $O(\log n)$ bits, requires $\Omega(n)$ bits of space.

Proof. Consider a family $\mathcal{F}$ of point sequences of length $2n - 2$. Each sequence $a_1, a_2, \ldots, a_{2n-2} \in \mathcal{F}$ has the following properties:

1. For $i = 2, 3, \ldots, n$, $a_{n+i-2}$ is located at coordinate zero. The coordinate of $a_{n-1}$ is $n$.

2. $|a_1 a_n| \ge |a_2 a_{n+1}| \ge |a_3 a_{n+2}| \ge \cdots \ge |a_{n-1} a_{2n-2}|$.

3. The coordinates of the points $a_j$, for $j = 1, 2, \ldots, n-2$, have the form $n \cdot k$ for some $k \in \{2, 3, \ldots, n\}$.

For a window that begins at point $a_s$ (and therefore ends at $a_{s+n-1}$), the diameter is exactly the distance $|a_s a_{s+n-1}|$. Any two members of the family have different diameters for the window that begins at $a_s$, for some $s \in \{1, 2, \ldots, n-2\}$ at which the coordinates of $a_s$ differ in the two sequences. Thus, an algorithm that maintains the diameter exactly has to distinguish any two sequences in $\mathcal{F}$.

By Property (3), the number of member sequences in $\mathcal{F}$ is $\binom{2n-3}{n-2} \ge (1.5)^{n/2}$, for $n$ sufficiently large. (The member sequences in $\mathcal{F}$ are in one-to-one correspondence with sequences of 0's and 1's containing $n - 2$ 0's and $n - 1$ 1's.) The algorithm thus needs $\log |\mathcal{F}| = \Omega(n)$ space. ■
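For small $n$, the distinguishing argument can be verified by brute force. The sketch below enumerates sequences under one reading of the three properties (non-increasing coordinates $n \cdot k$ for $a_1, \ldots, a_{n-2}$, then $a_{n-1} = n$, then zeros); `family` and `window_diameters` are hypothetical names.

```python
from itertools import combinations_with_replacement

def family(n):
    """Yield the sequences a_1..a_{2n-2} as lists of coordinates."""
    tail = [n] + [0] * (n - 1)       # a_{n-1} = n, then a_n..a_{2n-2} = 0
    # Non-increasing choices of k in {n, n-1, ..., 2} for a_1..a_{n-2}
    for ks in combinations_with_replacement(range(n, 1, -1), n - 2):
        yield [n * k for k in ks] + tail

def window_diameters(seq, n):
    """Exact diameter of every window of n consecutive points."""
    return tuple(max(seq[s:s + n]) - min(seq[s:s + n])
                 for s in range(len(seq) - n + 1))

n = 5
members = list(family(n))
signatures = {window_diameters(seq, n) for seq in members}
# Every member produces a different sequence of window diameters.
assert len(signatures) == len(members)
```

Because every member yields a distinct sequence of answers, the memory state after the first $n - 2$ points must differ across members, which is the source of the $\log |\mathcal{F}|$ bound.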

Note that, in this family $\mathcal{F}$, the ratio $R$ is just $n$. If we change the form of the coordinates of $a_j$, for $j = 1, 2, \ldots, n-2$, to $(1+\epsilon)^{3k}$ while respecting Property (2) above, a similar family of point sequences can be constructed for $\epsilon$-approximation algorithms. We have the following lower bounds for approximation from this modified family of point sequences.

Theorem 5.3 Let $R$ be the maximum, over all windows, of the ratio of the diameter to the minimum non-zero distance between any two points in that window. To $\epsilon$-approximately maintain the diameter of points on a line in a sliding window of size $n$ requires $\Omega(\frac{1}{\epsilon} \log R \log n)$ bits of space if $\log R \le \frac{3}{2}\epsilon \cdot n^{1-\delta}$, for some constant $\delta < 1$. The approximation requires $\Omega(n)$ bits of space if $\log R \ge \frac{3}{2}\epsilon \cdot n$.

Proof. Once again consider the family of point sequences in the proof of Theorem 5.2. We make the following change: keep the points $a_n, \ldots, a_{2n-2}$ at coordinate zero, but move the point $a_{n-1}$ to coordinate 1. The coordinates of the points $a_j$, for $j = 1, 2, \ldots, n-2$, have the form $(1+\epsilon)^{3k}$, for some $k \in \{1, 2, \ldots, m\}$, where $m = \frac{1}{3}\log_{1+\epsilon} R$. These coordinates are chosen so as to respect Property (2) in the proof of Theorem 5.2. Note that $\frac{2}{3\epsilon}\log R \ge m \ge \frac{1}{3\epsilon}\log R$, for $\epsilon$ sufficiently small, because $\epsilon/2 \le \log(1+\epsilon) \le \epsilon$. Depending on the value of $\log R$, we consider two cases:

1. $\log R \le \frac{3}{2}\epsilon \cdot n^{1-\delta}$, for some constant $\delta < 1$. By an argument similar to the one given in the proof of Theorem 5.2, the space requirement is now lower bounded by

\[ \log \binom{n+m-1}{m} \ge m \log \frac{n}{m} \ge \frac{1}{3\epsilon} \log R \, (\delta \log n) = \Omega\left(\frac{1}{\epsilon} \log R \log n\right). \]

2. $\log R \ge \frac{3}{2}\epsilon \cdot n$. In this case, $m \ge \frac{n}{2}$. We can always choose $\frac{n}{2}$ distinct values for the coordinates of points $a_1, \ldots, a_{n-2}$. The space requirement is lower bounded by

\[ \log \binom{n + n/2 - 1}{n/2} \ge \frac{n}{2} \cdot \log 2 = \Omega(n). \]
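The counting bound used in the two cases can be evaluated for concrete parameters. This sketch assumes $m = \frac{1}{3}\log_{1+\epsilon} R$ rounded down and measures space in bits (base-2 logarithms); the function names and the sample values of $n$, $\epsilon$, and $R$ are hypothetical.

```python
import math

def num_levels(R, eps):
    """m: number of admissible exponents k, assuming coordinates (1+eps)^(3k) <= R."""
    return int(math.log(R) / (3.0 * math.log1p(eps)))

def counting_bound_bits(n, m):
    """log2 of the number of multisets of size m over n items: log2 C(n+m-1, m)."""
    return math.log2(math.comb(n + m - 1, m))

def simple_bound_bits(n, m):
    """The weaker closed form m * log2(n / m) used in the case analysis."""
    return m * math.log2(n / m)

n, eps, R = 10_000, 0.1, 1e6
m = num_levels(R, eps)
# The binomial count dominates the closed-form bound.
assert counting_bound_bits(n, m) >= simple_bound_bits(n, m)
```

The inequality holds because $\binom{n+m-1}{m} \ge \left(\frac{n+m-1}{m}\right)^m \ge \left(\frac{n}{m}\right)^m$.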


6  Future  Work 

In this paper, we have initiated the study of computational-geometry problems in the streaming and sliding-window models and have provided bounds for approximate and exact diameter computation in these models. Massive streamed data sets for computational-geometry problems arise naturally as problems in areas such as information retrieval and pattern recognition are modeled as computational-geometry problems by means of an embedding into a metric space. Thus, we believe that the study of stream algorithms for basic problems in computational geometry is a promising direction for future research.